The advent of statistical software with powerful graphical and modeling capabilities has revolutionized the manner in which pharmacokinetic and pharmacodynamic analyses are performed. Knowledge discovery from a large (population) pharmacokinetic data set incorporates all steps taken from data assembly to the development of a population pharmacokinetic model and the communication of the results thereof. The process can be formalized into a number of steps: (1) creation of a data set for pharmacokinetic knowledge discovery, (2) data quality analysis, (3) data structure analysis (exploratory examination of raw data), (4) determination of the basic pharmacokinetic model that best describes the data and generating post hoc empiric individual Bayesian parameter estimates, (5) the search for patterns and relationships between parameters, and between parameters and covariates, by visualization, (6) the use of modern statistical modeling techniques for data structure revelation and covariate selection, (7) consolidation of the discovered knowledge into irreducible form (i.e., developing a population pharmacokinetic model), (8) the determination of model robustness (determination of the reliability of model parameter estimates), and (9) the communication and integration of the discovered pharmacokinetic knowledge. This process is discussed, and a motivating example is presented. The use of modern graphical, modeling, and statistical techniques for knowledge discovery from large pharmacokinetic data sets has given the data analyst the freedom to choose statistical methodology appropriate to the problem at hand with the maximization of information extraction, rather than on the basis of mathematical/statistical tractability.

Population pharmacokinetics is the study of the sources and correlates of variability in plasma drug concentrations between individuals who are preferably the target patient population receiving clinically relevant doses of a drug of interest.¹ Certain patient pathophysiological features, such as body weight, excretory and metabolic functions, and the presence of other therapies, can regularly alter dose-concentration relationships. For example, renal failure usually causes steady-state drug concentrations to be greater than those in patients with normal renal function receiving the same dosage of a drug eliminated mostly by the kidney. Population pharmacokinetics seeks to identify the measurable pathophysiologic factors that cause changes in the dose-concentration relationship and the extent of these changes so that if such changes are associated with clinically significant shifts in the therapeutic index (i.e., safety margin), dosage can be appropriately modified.

For years, most pharmacokineticists have developed population pharmacokinetic models for drugs without a careful exploratory examination and modeling of the large multivariate (population) pharmacokinetic data set. Population pharmacokinetic modeling in itself is a process of knowledge discovery from a population pharmacokinetic data set. Knowledge discovery in this sense is broader than the narrow definition of knowledge discovery as being the search for relationships and global patterns that exist in large databases but are “hidden” among the vast amounts of data.² Knowledge discovery from a population pharmacokinetic data set incorporates all steps taken from data assembly to the development of a population pharmacokinetic model.

The advent of modern statistical software such as S-Plus™³ with its powerful graphical and modeling capabilities has revolutionized the manner in which pharmacokinetic and pharmacodynamic analyses are performed. Careful exploratory examination and modeling of data (i.e., exploratory data analysis [EDA]) are fundamental steps in the development of a population pharmacokinetic model.⁴ The EDA is an essential precursor to, and is interwoven with, the nonlinear mixed-effects modeling step in that it can reveal information patterns and relationships or special structure in the data. Such information can expedite the selection of appropriate modeling steps with a reduction in the time spent in developing population pharmacokinetic models. Thus, the knowledge discovery process can be facilitated with EDA.

Most classical statistical procedures or types of software are based, either implicitly or explicitly, on assumptions about data, and the validity of the analyses depends on the validity of the assumptions. Good data-visualization tools provide powerful diagnostic mechanisms for confirming assumptions or, when the assumptions are not met, for suggesting corrective action.⁴

One of the most difficult tasks for a pharmacometrician/pharmacokineticist is to convey findings from pharmacostatistical analyses to clinicians and other members of the drug development team. Failure to communicate these findings successfully puts at risk all his or her data analysis efforts, irrespective of their quality. The use of high-quality graphics can effectively enhance the communication of pharmacometric knowledge to a medical research team. Graphics, in particular, are essential for conveying relations and trends in an informal and simplified visual form.

The remainder of the article is divided into the following sections: a process of knowledge discovery from a large (population) pharmacokinetic data set, a motivating example of the implementation of the pharmacokinetic knowledge discovery process, discussion, and concluding remarks.

THE PHARMACOKINETIC KNOWLEDGE 
DISCOVERY PROCESS 

The purpose of data analysis and interpretation, in general, is to find out, among other things, meaningful patterns and relationships between variables under consideration. It is analysis that transforms data into useful information.⁵ In other words, the purpose of a population pharmacokinetic data analysis is to extract knowledge available in the population pharmacokinetic data set. Sufficient knowledge of the problem investigated and knowledge of mathematics and statistics are essential criteria for a successful pharmacometric evaluation of a population pharmacokinetic data set. Significant untapped knowledge often lies hidden in a population pharmacokinetic data set. Knowledge discovery is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.² The challenge of knowledge discovery in a population pharmacokinetic data set is to make effective use of the data set to discover the untapped knowledge that lies hidden therein. Specifically, it is the task of implementing (and developing) methodologies that can lead to the discovery of interesting, useful patterns and relationships that can be used (1) to support mission-critical decision making in drug development or drug therapy, (2) for predictions, (3) for the explanation of variability and making dosage recommendations, or (4) for relating pharmacokinetics to pharmacodynamics, efficacy, and safety. These constitute some of the objectives of population modeling.

Knowledge discovery is an emerging interdisciplinary research field that lives at the intersection of computer science (database, artificial intelligence, graphics, and visualization), statistics, and several application domains such as clinical pharmacology in general and pharmacometrics in particular. Knowledge discovery in a large pharmacokinetic data set is a process that can be formalized into a number of steps. Briefly, these steps are as follows:

(1) creating a data set on which knowledge discovery will be performed;

(2) cleaning and processing the data (i.e., data quality analysis);⁶,⁷

(3) data structure analysis: exploratory examination of raw data (concentrations and covariates) for hidden structure and the reduction of the dimensionality of the covariate vector;

(4) determining the basic pharmacokinetic model that best describes the data and generating post hoc empiric individual Bayesian parameter estimates;

(5) searching for patterns and relationships between parameters, and between parameters and covariates, through graphical displays and visualization;

(6) using modern statistical modeling techniques such as multiple linear regression (MLR), generalized additive modeling (GAM),⁸ and tree-based modeling (TBM) to reveal structure in the data and initially select explanatory covariates;

(7) consolidating the discovered knowledge in (6) into irreducible form (i.e., developing a population pharmacokinetic model using the nonlinear mixed-effects modeling approach);

(8) determining model robustness through sensitivity analysis, examination of parametric/nonparametric standard errors, or determination of stability and predictive performance; and

(9) communicating and integrating the discovered pharmacokinetic knowledge.

Generally, not much attention has been given to data quality analysis or structure revelation in a population pharmacokinetic data set that could provide a link between the data set and the analytical path chosen for pharmacokinetic knowledge discovery. A lack of these analyses can result in the production of biased population pharmacokinetic parameter estimates. To avoid bias caused by a nonsystematic data analysis, the above structured approach has been proposed.

Aspects of data quality that need to be analyzed are correctness and completeness. Correctness of concentration-time data relative to dosing history and covariate information can be checked by comparing the records in the population pharmacokinetic data set with information in the case report forms, and this can be done by using a sample of the records. Completeness of population pharmacokinetic (and pharmacodynamic) data records is a property that often cannot be satisfied for many reasons, such as omissions when recording or inputting data into a clinical database or the malfunctioning of medical equipment.⁹ To prepare data for population pharmacokinetic knowledge discovery, some imputation of data may be done, and there are different procedures available for handling missing data.⁶,¹⁰,¹¹

Data structure analysis is the examination of the raw data for “hidden” structure, outliers, or leverage observations. This is further repeated during the exploratory modeling (and nonlinear mixed-effects modeling) steps using case deletion diagnostics.⁴ This type of analysis is important since outliers or leverage observations may occur in a population pharmacokinetic data set.

The knowledge discovery basis of population pharmacokinetic modeling permits the generation of hypotheses from the relationships discovered during data structure analysis, which can be further tested in the nonlinear mixed-effects modeling step. It can also suggest a testable hypothesis that can be independently tested via traditional confirmatory experiments and analysis.

POPULATION PHARMACOKINETICS


A MOTIVATING EXAMPLE 

Data from 138 subjects (comprising both sexes with ages ranging from 5 to 81 years and weighing between 17 and 96 kg) who were administered different doses of a drug under development were pooled from six studies for the development of a population pharmacokinetic model (see Ette and Ludden⁴).

A structured approach to the development of a population pharmacokinetic model proposed in the preceding section was used.⁴ This was done to uncover “hidden” structure in the multivariable data that informed the analysis path taken to develop a population pharmacokinetic model. After ensuring data quality, the steps taken to develop the population pharmacokinetic model were the following:

1. data structure analysis (i.e., exploratory examination of raw concentration data, distributions, and correlations between covariates);

2. determination of a basic pharmacokinetic model using the NONMEM program and generation of empiric Bayesian individual parameter estimates for subsequent use in searching for patterns and relationships between parameters and covariates;

3. use of empiric Bayesian estimates generated in (2) with covariate data in MLR with case deletion diagnostics, GAM, and TBM for the initial selection of covariates and further revelation of “hidden” structure in the data;

4. consolidation of the knowledge discovered in step (3) into an irreducible (final) form through nonlinear mixed-effects modeling to develop the population pharmacokinetic model;

5. determination of the reliability of parameter estimates using the jackknife technique; and

6. communication of the discovered knowledge.

Exploratory Examination of Raw Concentration 
Data, Distributions, and Correlations 
between Covariates 

The concentration data from each of the studies were plotted and examined independently of each other for clustering of observations using a “spaghetti” (profile) plot and later pooled and the plot repeated. Figure 1 shows a clustering of observations into two groups in the terminal elimination phase of the plasma concentration-time profile.

A slight bimodality in creatinine clearance (CRCL) is observed in Figure 2. This was confirmed with formal testing using Shapiro-Wilk’s test.¹² Figure 2 summarizes the distributions of continuous covariates in the population pharmacokinetic data set.⁴

Figure 1. “Spaghetti” plot of individual plasma profiles (in clusters) from a population pharmacokinetic data set of a drug under development.

Figure 2. Frequency distribution of demographic variables: (a) creatinine clearance (CRCL), (b) weight (WT), and (c) age. (Extracted from Ette and Ludden⁴)


The correlations of weight and age, and of CRCL and age/weight, were formally tested. Although weight was significantly correlated with age (p < 0.0001), the correlation coefficient was 0.34. Similarly, creatinine clearance was significantly correlated with age (p < 0.001), but the correlation coefficient was 0.29. Creatinine clearance was not correlated with weight (r = 0.0…). Relying only on numerical summaries would imply taking by faith that the salient features of the relationship between the variables have been captured. When one or a few numerical summaries are used to characterize the relationship between two variables, there is a risk of missing important features or, worse, of being misled.¹³ Thus, a scatterplot matrix was used to summarize relationships between covariables (Figure 3). With the weak correlations observed between covariates, the dimensionality of the covariate vector was not reduced; hence, all covariates were used in the subsequent exploratory modeling step.
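
The gap between statistical significance and strength of association can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that each correlation was computed on all 138 pooled subjects (the text does not state the sample size used) and applies the standard t statistic for testing a Pearson correlation against zero. The function name is hypothetical.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N_SUBJECTS = 138  # assumption: every pooled subject contributed to each correlation

for label, r in [("weight vs. age", 0.34), ("CRCL vs. age", 0.29)]:
    t = correlation_t_statistic(r, N_SUBJECTS)
    shared = r * r  # fraction of variance shared by the two covariates
    print(f"{label}: r = {r:.2f}, t = {t:.2f}, r^2 = {shared:.3f}")
```

Under that assumption, a correlation of 0.34 is far from zero in the testing sense, yet the two covariates share only about 12% of their variance, which is consistent with keeping both in the covariate vector.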

Structure Revelation and Covariate Selection 

NONMEM¹⁴ was used to select a pharmacokinetic model (i.e., a two-compartment model) that best described the data. With this model, post hoc empiric Bayesian individual parameter estimates were generated and used with the covariates as “data” in the exploratory modeling step for covariate selection and data structure revelation. For the purpose of this article, the focus is on the development of a population pharmacokinetic model for clearance (CL).

A density plot (Figure 4) of CL estimates with the bandwidth computed using Silverman’s rule of thumb clearly shows the presence of two subpopulations in the data set. The bimodality in the distribution of individual CL estimates suggested the presence of two subpopulations. Figure 5 is a pairs plot of the population pharmacokinetic data set showing the relationships between individual Bayesian CL estimates and subject demographics (covariates). An S-Plus™ function was used to produce the content of each scatterplot with loess regression curves superimposed to aid the visual perception of the relationship between the variables. This figure can easily be appreciated by dividing it into two triangular halves: an upper triangle to the right and a lower triangle to the left. For example, visualizing the relationship between CL (ordinate) and CRCL (abscissa) would necessitate using the triangle to the right. It can be observed that CL is related to CRCL by some nonlinear function.
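
A density estimate of this kind can be sketched as follows. This is a minimal illustration, not the S-Plus code used in the analysis: it assumes the common form of Silverman's rule of thumb, h = 0.9 · min(sd, IQR/1.34) · n^(-1/5), and a Gaussian kernel. The clearance values are hypothetical and the function names are made up for the example.

```python
import math
import statistics


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def kde_at(values, h, x):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    norm = 1.0 / (len(values) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)


# Hypothetical clearance values (L/h) from two well-separated groups.
cl = [2.1, 2.4, 2.6, 2.9, 3.0, 3.3, 11.8, 12.5, 13.1, 13.6, 14.2, 14.9]
h = silverman_bandwidth(cl)
grid = [i * 0.5 for i in range(37)]  # 0 to 18 L/h in steps of 0.5
density = [kde_at(cl, h, x) for x in grid]

# Count interior local maxima of the estimated density; two suggests bimodality.
modes = sum(
    1
    for i in range(1, len(grid) - 1)
    if density[i] > density[i - 1] and density[i] > density[i + 1]
)
print(f"bandwidth = {h:.2f} L/h, local maxima = {modes}")
```

Counting local maxima on a grid is only a crude screen; the visual inspection of the density plot described above remains the primary evidence.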

Using the box plot, the relationship between CL and age divided into groups of 10 years was examined. It can be seen in Figure 6a that CL does not appear to be a function of age group since the CL values are similar between age groups. Similarly, sex does not influence the CL of this drug (Figure 6b). The distribution of CL values between the sexes is similar. Therefore, it can be inferred from these plots that age and sex are probably not predictors of CL.

Figure 3. Pairs plot showing relationship between covariates.

Figure 4. Density plot showing bimodality in the distribution of empiric Bayesian estimates of clearance (CL). The bandwidth for the density plot was determined using Silverman’s rule of thumb.¹⁵

Both MLR using the general linear modeling (GLM) procedure and GAM⁴,⁸ in S-Plus were used to model the data.


Usually, the multiple linear regression approach is the method of choice for this type of problem. The multiple linear regression model is expressed in the following form:

$$P = \alpha + \sum_{j=1}^{p} \beta_j X_j + \varepsilon, \qquad (1)$$

where $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = \sigma^2$. This model makes a strong assumption of the linear dependence of $E(P)$ (the expectation of $P$, or mean response) on the predictors. The MLR model is extremely useful and convenient if this assumption holds, even roughly, because it provides a description of the data, summarizes the contribution of each predictor with a single coefficient, and provides a simple method for predicting new observations.

The assumption of linear dependence of the response variable on each of the predictors may not always hold. For many types of data, a change in the mean of the response variable is accompanied by a change in its variance. The GAM approach presents a general perspective for the handling of covariates in a multiple-regression setting. The linear form $\alpha + \sum_j \beta_j X_j$ is replaced with the additive form $\alpha + \sum_j f_j(X_j)$, that is,

$$P = \alpha + \sum_{j=1}^{p} f_j(X_j) + \varepsilon, \qquad (2)$$

where $f_j(X_j)$ is an arbitrary univariate function that is either a linear function or a smoothing spline. Since each covariate is represented separately in equation (2), GAM retains the important interpretive feature of the linear model: the variation of the fitted response surface holding all but one predictor fixed does not depend on the values of the other predictors. In practice, this means that once the additive model is fitted to the data, one can plot the $p$ coordinate functions separately to examine the roles of predictors in modeling the response. The estimated function forms of GAM are analogues of the coefficients in multiple linear regression. Thus, separate functions are introduced to allow for nonlinearity and heterogeneous variances. This is closer to a reparameterization of the model than to a reexpression of the response. Generalized additive models are a group of models that are as tractable as the linear model is but do not force the data into unnatural scales.

Figure 5. Pairs plot showing relationship between CL and the covariables.

For both MLR and GAM, the data were subjected to a stepwise (single-term addition/deletion) modeling procedure. Each covariate was allowed to enter the model in any of several functional representations. The Akaike information criterion (AIC), a criterion for judging model quality, was used as the model selection criterion. At each step, the model is changed by the addition or deletion of the covariate that results in the largest decrease in AIC. The search was stopped when the AIC reached a minimum value. In the building of the GAM model, each covariate was allowed to enter the model linearly or as a spline function. The scatterplot smoothing function in S-Plus was used.
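
The single-term addition/deletion search can be sketched as below. This is a simplified illustration under stated assumptions, not the S-Plus procedure: covariates enter only linearly, the fit is ordinary least squares solved through the normal equations, and AIC is taken in one common Gaussian form, n · ln(RSS/n) + 2k, which may differ from the form S-Plus uses by more than a constant. All data and function names are hypothetical.

```python
import math


def fit_rss(columns, y):
    """Least-squares fit of y on an intercept plus the given columns; returns the RSS."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))


def aic(rss, n, n_coef):
    """Gaussian AIC up to an additive constant."""
    return n * math.log(rss / n) + 2 * n_coef


def stepwise_aic(candidates, y):
    """Add or drop one covariate per step, taking the move with the largest AIC decrease."""
    n = len(y)
    selected = []
    best = aic(fit_rss([], y), n, 1)
    while True:
        moves = []
        for name in candidates:
            if name in selected:
                trial = [s for s in selected if s != name]  # deletion move
            else:
                trial = selected + [name]  # addition move
            rss = fit_rss([candidates[t] for t in trial], y)
            moves.append((aic(rss, n, len(trial) + 1), trial))
        score, trial = min(moves, key=lambda m: m[0])
        if score >= best:  # no single-term change lowers AIC: stop
            return selected, best
        selected, best = trial, score


# Hypothetical data: CL is driven by CRCL plus a small fixed perturbation.
crcl = [20, 35, 50, 65, 80, 95, 110, 125]
age = [70, 30, 55, 42, 63, 25, 48, 37]
wt = [60, 82, 71, 55, 90, 66, 78, 59]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.4, -0.3, -0.1]
cl = [1.0 + 0.25 * c + e for c, e in zip(crcl, noise)]

selected, score = stepwise_aic({"CRCL": crcl, "AGE": age, "WT": wt}, cl)
print(selected, round(score, 2))
```

In this toy data set the search keeps CRCL alone, because adding age or weight reduces the residual sum of squares by too little to offset the penalty of 2 per extra coefficient.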

Cook’s distance diagnostic was applied to the results of the MLR, and it did not reveal any significant leverage observations in the data. Cook’s distance diagnostic measures the effect of the ith individual (observation) on the coefficient vector.
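
For readers unfamiliar with the diagnostic, the sketch below computes Cook's distance for a regression with a single predictor, using the textbook form D_i = [e_i² / (p · s²)] · h_i / (1 − h_i)², where e_i is the residual, h_i the leverage, s² the residual mean square, and p the number of coefficients. The data are hypothetical and deliberately include one influential subject, unlike the data set analyzed in the article.

```python
def cooks_distances(x, y):
    """Cook's distance of each observation in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual mean square
    leverages = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e * e / (p * s2)) * (h / (1.0 - h) ** 2)
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: the last subject lies far from the others and off the trend.
crcl = [40, 55, 60, 70, 85, 90, 100, 160]
cl = [11.2, 14.6, 16.1, 18.3, 22.4, 23.3, 26.2, 25.0]
d = cooks_distances(crcl, cl)
most_influential = max(range(len(d)), key=d.__getitem__)
print("most influential index:", most_influential)
```

A large distance flags an observation whose removal would move the fitted coefficients appreciably; how large is "large" is a judgment call that this sketch does not settle.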

The partial residual plots in Figure 7 summarize the results of the MLR and GAM approaches. The inadequacy of the linear model relating CRCL to CL is obvious with the poor prediction of subjects whose CRCL values were below 55 ml/min (Figure 7a). There are two distinguishable clouds of points above and below the CRCL value of 55 ml/min. The data points above the 55 ml/min CRCL value are scattered randomly around the regression line, while those below are not. The latter are poorly predicted. These two groups of points represent two subpopulations seen in Figure 1b. The partial residual plots reveal the inadequacy of the linear model.

The relationship between CL and age is flat, more or less (Figure 7b). This confirmed the finding from the box plots (Figure 6a) that CL was not a function of age. Neither modeling procedure (i.e., MLR or GAM) selected weight or sex as a predictor of CL. The finding that sex was not a predictor of CL served as a confirmation of the result of the exploratory graphical analysis with the box plot (Figure 6b).

Figure 6. (a) Box plots of the distribution of empiric Bayesian CL estimates by age groups. The age groups were created at intervals of 10 years. The minimum age was 5 years and the maximum 81 years. (b) Box plots of the distribution of empiric Bayesian CL estimates by sex.

The results of the GAM model in Figure 7c show that CRCL as a nonlinear predictor of CL was better than the linear function of the MLR. The detection of nonlinearity, however, is the central motivation for partial residual plots. Although a nonlinear function (Figure 7c) described the data better than the linear function (Figure 7a), it was still inadequate. The bulk of the subjects with low CRCL values were still poorly predicted. This informed the testing in NONMEM of a nonlinear model of CRCL as a predictor of CL and two separate linear models for the two subpopulations as informed by the distribution of CL and CRCL values and the GAM results. The relationship between the partial residuals and age is flat for the majority of the data. Only a few observations appear to skew the values at extremes of age (Figure 7d).

Figure 7. MLR and GAM analyses results: scatterplots of partial residuals of CL (L/h) versus (a) CRCL (ml/min) and (b) age (years) from MLR analysis, and CL (L/h) versus (c) CRCL (ml/min) and (d) age (years) from GAM analysis. The same scale is used for the ordinate in each plot so that the relative importance of each covariate can be compared. (Extracted from Ette and Ludden⁴)

Figure 8. A tree-based model for CL. The top panel displays the labeled dendrogram. The numeric value indicated at each split node is the breakpoint value of the covariate used for the split at that node. The values displayed at the leaves (terminals of the splits) are the average CL values for the homogeneous group of subjects selected into the particular portion of the split. The lower panel displays a side-by-side histogram for CL (L/h), CRCL (ml/min), age (years), and WT (kg). The left-side histogram summarizes the observations following the left split and similarly for the right split. (Extracted from Ette and Ludden⁴)

TBM is an exploratory modeling technique for uncovering structure in the data and assessing the adequacy of linear models.⁴,¹⁶ It operates only on ranks of the data. This aspect of TBM for a numeric explanatory variable renders it invariant to monotone transformations of the explanatory variable. It automatically incorporates interactions between covariates, such as when one parameter-covariate relationship depends on another covariate. These are important advantages that TBM has over GAM. The result of TBM is shown in Figure 8. At each node, every covariate is checked for a possible split point where the variances of the two subsets that would result from the split are most different. That is, the split is carried out at the point where the objective function value (the deviance, which in the usual case is just the sum of squares) decreases the most. As implemented in S-Plus, this process is continued in each branch of the split until the terminal node contains 10 or fewer data points. CRCL was the most significant predictor of CL. The numeric value indicated at each split node is the breakpoint value of the covariate used for the split at that node. For instance, a CRCL value of 44.8 ml/min was used at the root node for the split. The values displayed at the leaves (terminals of the splits) are the average CL values for the homogeneous group of subjects selected into the particular portion of the split. The histograms of the splits for CL and CRCL in the ordinate direction show splits of CL almost matching those of CRCL (Figure 8), suggesting two levels of response by two subpopulations.⁴
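
The split search at a single node can be sketched as follows. This is an illustrative reduction, not the S-Plus implementation: it handles one numeric covariate, uses the sum of squares as the deviance, places the breakpoint midway between adjacent covariate values, and does not recurse into the branches. The data and the function name are hypothetical.

```python
def best_split(x, y):
    """Return (threshold, deviance) for the split of x that minimizes the
    summed within-group sum of squares of y."""

    def sum_sq(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates tied covariate values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        deviance = sum_sq(left) + sum_sq(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        if best is None or deviance < best[1]:
            best = (threshold, deviance)
    return best


# Hypothetical data with two levels of clearance.
crcl = [22, 30, 38, 41, 52, 60, 75, 90]
cl = [6.1, 7.0, 7.9, 8.4, 14.8, 16.5, 20.1, 23.9]
threshold, deviance = best_split(crcl, cl)
print(f"split CRCL at {threshold} ml/min, deviance after split = {deviance:.2f}")
```

Because only the ordering of the covariate values determines which subjects fall on each side, applying a monotone transformation to CRCL (a logarithm, say) would yield the same two groups, which is the invariance property noted above.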

NONMEM Modeling to Develop the 
Population Pharmacokinetic Model 

The population pharmacokinetic (PPK) model for CL of this drug was developed in NONMEM by backward elimination using the covariates CRCL and age. The final NONMEM PPK model used only CRCL as a predictor of CL, with different slopes and intercepts to characterize the subpopulations.³ The models estimating CL in the two subpopulations were the following:

Model I: CL_I (L/h) = 1.0 (20.0%) + 0.25 (44.0%) • CRCL for individuals with a shallow slope of the CL regression model (subpopulation I).

Model II: CL_II (L/h) = 1.0 (20.0%) • 0.27 (35.0%) + 0.25 (44.0%) • 4.43 (45.0%) • CRCL for the group of individuals with a steep slope of the CL regression model (subpopulation II).

The data in parentheses represent the percent relative standard error, a measure of precision associated with the estimation of these parameters. The models above were used for recommending doses for patients in the two subpopulations.
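
Read literally, the two models can be evaluated as in the sketch below. It uses the point estimates quoted above and, as an assumption about how the models are assigned, the 50 ml/min CRCL cutoff that the article gives under "Communication of Discovered Knowledge". The function name is hypothetical, and the sketch ignores interindividual and residual variability.

```python
def predicted_clearance(crcl_ml_min: float) -> float:
    """Typical clearance (L/h) from the two point-estimate models in the text."""
    theta1, theta2, theta3, theta4 = 1.0, 0.27, 0.25, 4.43
    if crcl_ml_min < 50.0:  # cutoff taken from the article's dosing guideline
        return theta1 + theta3 * crcl_ml_min  # model I
    return theta1 * theta2 + theta3 * theta4 * crcl_ml_min  # model II


for crcl in (30.0, 80.0):
    print(f"CRCL = {crcl:.0f} ml/min -> CL = {predicted_clearance(crcl):.2f} L/h")
```

In model II, the second and fourth parameters act as scaling factors on the intercept and slope of model I, so the two subpopulations share the first and third parameters.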
The result obtained from NONMEM modeling was predicted from exploratory examination and modeling of the data and confirms the usefulness of exploratory data examination and modeling in the discovery of knowledge from a large pharmacokinetic data set.

Reliability of Estimates 

The reliability of the parameter estimates was checked using a nonparametric technique, the jackknife.⁴ The nonlinearity of a statistical model and ill conditioning of a given problem can produce numerical difficulties and force an estimation algorithm into a false minimum.

Table I. Comparison of Normal Theory Parameter and Variance Estimates with Jackknife Estimators

             Normal Theory                   Jackknife (a)
Parameter    Estimate   SE      %RSE        Estimate   SE      %JKK SE
θ₁           0.990      0.200   20.200      0.780      0.075    9.615
θ₂           0.270      0.090   33.330      0.910      0.335   36.813
θ₃           0.250      0.110   44.000      0.260      0.055   21.150
θ₄           4.430      2.000   45.460      3.513      0.347    9.787

a. A total of 10% of subjects were removed/run. θ₁ is the intercept for the regression of CRCL as a predictor of CL (i.e., CL_I) in subpopulation I. θ₂ is the intercept scaling factor for the regression of CL (i.e., CL_II) on CRCL in subpopulation II. θ₃ is the regression coefficient for CRCL as a predictor of CL in subpopulation I. θ₄ is the regression coefficient scaling factor for CRCL as a predictor of CL in subpopulation II. (Excerpted from Ette and Ludden⁴)

The preciseness of the primary parameters can be estimated from the final fit of the multiexponential function to the data, but they are of doubtful validity if the model is severely nonlinear.¹⁷ The preciseness of the secondary parameters (in this case, variability) is likely to be even less reliable. Consequently, the results of statistical tests carried out with preciseness estimated from the final fit could easily be misleading; hence the need to assess the reliability of the estimates. The first-order method in NONMEM, which was used for the development of the population pharmacokinetic model, can yield estimates of parameters that are sometimes biased. A possible way of reducing bias in parameter estimates and of calculating realistic variances for them is to subject the data to the jackknife technique.¹⁸⁻²⁰ The technique requires little by way of assumption or analysis. A naive Student t approximation for the standardized jackknife estimator¹⁹ was used in the investigation.
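
A delete-a-group jackknife of the kind summarized in Table I can be sketched as follows. This is a generic illustration, not the authors' procedure: the estimator here is a simple mean standing in for a full NONMEM fit, subjects are assigned to 10 groups in a fixed interleaved pattern (so each run omits 10% of them), and the pseudo-value formulation is assumed. Data and names are hypothetical.

```python
import math
import statistics


def grouped_jackknife(data, estimator, n_groups=10):
    """Delete-a-group jackknife; returns (jackknife estimate, standard error)."""
    groups = [data[i::n_groups] for i in range(n_groups)]
    theta_all = estimator(data)
    pseudo_values = []
    for j in range(n_groups):
        # Refit with group j left out.
        kept = [v for i, g in enumerate(groups) if i != j for v in g]
        theta_minus_j = estimator(kept)
        pseudo_values.append(n_groups * theta_all - (n_groups - 1) * theta_minus_j)
    estimate = statistics.mean(pseudo_values)
    std_error = statistics.stdev(pseudo_values) / math.sqrt(n_groups)
    return estimate, std_error


# Hypothetical individual clearance values (L/h) for 20 subjects.
cl = [7.8, 8.4, 9.1, 6.9, 8.8, 7.5, 9.6, 8.1, 7.2, 8.9,
      9.3, 7.7, 8.0, 8.6, 7.4, 9.0, 8.3, 7.9, 8.7, 8.2]
estimate, std_error = grouped_jackknife(cl, statistics.mean)
percent_se = 100.0 * std_error / estimate
print(f"estimate = {estimate:.3f}, SE = {std_error:.3f}, %SE = {percent_se:.2f}")
```

The percent standard error computed in the last step matches the way the %JKK SE column of Table I appears to be formed; for the first parameter, 100 × 0.075 / 0.780 ≈ 9.6.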

Most of the jackknife estimators for regression coefficients and variability were similar to the NONMEM estimates (Table I). In some cases, these were more precise than the NONMEM estimates. The similarity of most of the final NONMEM parameter estimates to jackknife estimators indicated that the NONMEM estimates were relatively unbiased and precise. The difference between the normal theory and jackknife estimates of θ₂ was a consequence of the parameterization of the NONMEM model for CL. This parameter was used as the intercept scaling parameter for subpopulation II and was, therefore, sensitive to changes in the jackknife data sets when some subjects from subpopulation I (the smaller subpopulation) were not included in some of the jackknife (JKK) data sets. This is a limitation of the jackknife technique. The population pharmacokinetic estimates obtained with the final (irreducible) NONMEM model were, therefore, found to be reliable.

The reliability of parameter estimates can also be determined using the bootstrap approach (see Ette,²¹ for instance, for the application of the bootstrap to population pharmacokinetic modeling). The use of the bootstrap for the determination of the reliability of parameter estimates is outside the scope of this article.

Communication of Discovered Knowledge 

The population models developed for estimating clearance in the two subpopulations were used in designing two dosage guidelines for patients in the respective subpopulations. Patients with CRCL less than 50 ml/min (subpopulation I) had their drug clearance values estimated with model I, while patients with CRCL greater than or equal to 50 ml/min had their drug clearance values estimated with model II.

DISCUSSION 

In the example discussed, a hypothesis of bimodality in CL distribution was developed, which was further investigated and confirmed. In fact, this hypothesis formed the framework for the development of the population pharmacokinetic model. The population pharmacokinetic model developed was used for designing dosage guidelines for the drug. It could also be used in clinical trial simulation for the planning of a subsequent clinical trial with the drug. These, in essence, are avenues for applying the discovered pharmacokinetic knowledge.

The use of population modeling for knowledge discovery from a pharmacokinetic/pharmacodynamic data set accumulated from dose-finding studies addresses the shortcoming of traditional ways of analyzing such data through the discovery of interesting patterns. The shortcoming of traditional analysis of pharmacokinetic/pharmacodynamic data is that one does not know which questions to ask (hypotheses to test) to discover hidden knowledge.

The process of knowledge discovery from a population pharmacokinetic data set (i.e., from data assembly to population pharmacokinetic model development) is an iterative and heterogeneous task, and it might be argued that it is difficult to summarize the process in terms of the formalized steps described in this article. However, over the past 20 years, considerable knowledge has been accumulated on this subject; therefore, it has become necessary to delineate the steps taken in the process of knowledge discovery from a population pharmacokinetic data set.




PM3000456982 

Source: https://www.industrydocuments.ucsf.edu/docs/mmcl0001 



ETTE ET AL


CONCLUSION 

A structured approach for knowledge discovery from a large pharmacokinetic data set has been described. The approach described in this article lays out systematically how hidden knowledge can be discovered from a population pharmacokinetic data set with the use of modern graphical, modeling, and statistical approaches. The use of these techniques for knowledge discovery from a large pharmacokinetic data set gives the data analyst the liberty to choose statistical methodology appropriate to the problem at hand with the maximization of information extraction, rather than on the basis of mathematical/statistical tractability.